- <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
- <html>
- <head>
- <title>Clean up your Web pages with HTML TIDY</title>
- <meta name="keywords" content=
- "HTML, validation, error correction, pretty-printing">
- <meta name="author" content="Dave Raggett <dsr@w3.org>">
- <style>
- body {
- margin-left: 10%;
- margin-right: 10%;
- font-family: sans-serif
- }
- h1 { margin-left: -8% }
- h2,h3,h4,h5,h6 { margin-left: -4% }
- pre { color: green; font-weight: bold; font-family: monospace}
- em { font-style: italic; color: rgb(0, 0, 153) }
- strong { text-transform: uppercase; font-weight: bold }
- .note {font-style: italic; color: rgb(192, 101, 101) }
- /* hr {text-align: center; width: 60% } */
- blockquote {
- color: navy;
- font-family: "Comic Sans MS", "Times New Roman", serif
- }
- blockquote.people { text-align: center; }
- table {
- font-family: sans-serif;
- font-size: 80%;
- background: rgb(255,255,153)
- }
- td {
- font-size: 80%
- }
- .people {font-family: "Lucida Calligraphy", serif}
- :link { color: rgb(0, 0, 153) }
- :visited { color: rgb(153, 0, 153) }
- :active { color: rgb(255, 0, 102) }
- :hover { color: rgb(0, 0, 255) }
- </style>
- </head>
- <body bgcolor="#FFFFFF" background="data/grid.gif" text="black" link=
- "navy" vlink="black" alink="red">
- <h1 align="center"><img src="data/tidy.gif" width="32" height="32"
- align="top" alt="icon"> Clean up your Web pages<br>
- with HTML TIDY</h1>
-
- <p align="center"><b>This version 7th July 1999</b></p>
-
- <p align="center"><small>Copyright © 1999 <a href=
- "http://www.w3.org">W3C</a>, see <a href="tidy.c">tidy.c</a> for
- copyright notice.</small></p>
-
- <blockquote>With many thanks to <a href="http://www.hp.com">
- Hewlett Packard</a> for financial support during the development
- of this software!</blockquote>
-
- <hr align="center" width="80%">
- <p align="center"><a href="#help">How to use Tidy</a> | <a href=
- "#download">Downloading Tidy</a> | <a href="release-notes.html">
- Release Notes</a><br>
- <a href="#quotes">Integration with other Software</a> | <a href=
- "#acks">Acknowledgements</a></p>
-
- <hr align="center" width="80%">
- <p>To get the latest version of Tidy please visit the original
- version of this page at: <a href=
- "http://www.w3.org/People/Raggett/tidy">
- http://www.w3.org/People/Raggett/tidy</a>. Courtesy of Netmind,
- you can register for email reminders when new versions of tidy
- become available.</p>
-
- <form method="GET" action=
- "http://www.netmind.com/cgi-bin/uncgi/url-mind">
- <center><input type="SUBMIT" value="Press Here to Register">
- </center>
- </form>
-
- <p>The public email list devoted to HTML Tidy is: <<a href=
- "mailto:html-tidy@w3.org">html-tidy@w3.org</a>>. To subscribe
- send an email to html-tidy-request@w3.org with the word subscribe
- in the subject line (include the word unsubscribe if you want to
- unsubscribe). The <a href=
- "http://lists.w3.org/Archives/Public/html-tidy/">archive</a> for
- this list is accessible online. Please use this list to report
- errors or enhancement requests. See the <a href=
- "release-notes.html"><b>release notes</b></a> for information on
- recent changes. Your feedback is welcome!</p>
-
- <h3>Introduction to TIDY</h3>
-
- <p>When editing HTML it's easy to make mistakes. Wouldn't it be
- nice if there was a simple way to fix these mistakes
- automatically and tidy up sloppy editing into nicely laid out
- markup? Well now there is! Dave Raggett's HTML TIDY is a free
- utility for doing just that. It also works great on the
- atrociously hard to read markup generated by specialized HTML
- editors and conversion tools, and can help you identify where you
- need to pay further attention to making your pages more
- accessible to people with disabilities.</p>
-
- <p>Tidy is able to fix up a wide range of problems and to bring
- to your attention things that you need to work on yourself. Each
- item found is listed with the line number and column so that you
- can see where the problem lies in your markup. Tidy won't
- generate a cleaned up version when there are problems that it
- can't be sure of how to handle. These are logged as "errors"
- rather than "warnings".</p>
-
- <h3>Examples of TIDY at work</h3>
-
- <p>Tidy corrects the markup in a way that matches where possible
- the observed rendering in popular browsers from Netscape and
- Microsoft. Here are just a few examples of how TIDY perfects your
- HTML for you:</p>
-
- <ul>
- <li><b>Missing or mismatched end tags are detected and
- corrected</b>
-
- <pre>
- <h1>heading
- <h2>subheading</h3>
- </pre>
-
- <p>is mapped to</p>
-
- <pre>
- <h1>heading</h1>
- <h2>subheading</h2>
- </pre>
- </li>
-
- <li><b>End tags in the wrong order are corrected:</b>
-
- <pre>
- <p>here is a para <b>bold <i>bold italic</b> bold?</i> normal?
- </pre>
-
- <p>is mapped to</p>
-
- <pre>
- <p>here is a para <b>bold <i>bold italic</i> bold?</b> normal?
- </pre>
- </li>
-
- <li><b>Fixes problems with heading emphasis</b>
-
- <pre>
- <h1><i>italic heading</h1>
- <p>new paragraph
- </pre>
-
- <p>In Netscape and Internet Explorer this causes everything
- following the heading to be in the heading font size, not the
- desired effect at all!</p>
-
- <p>Tidy maps the example to</p>
-
- <pre>
- <h1><i>italic heading</i></h1>
- <p>new paragraph
- </pre>
- </li>
-
- <li><b>Recovers from mixed up tags</b>
-
- <pre>
- <i><h1>heading</h1></i>
- <p>new paragraph <b>bold text
- <p>some more bold text
- </pre>
-
- <p>Tidy maps this to</p>
-
- <pre>
- <h1><i>heading</i></h1>
- <p>new paragraph <b>bold text</b>
- <p><b>some more bold text</b>
- </pre>
- </li>
-
- <li><b>Getting the <hr> in the right place:</b>
-
- <pre>
- <h1><hr>heading</h1>
- <h2>sub<hr>heading</h2>
- </pre>
-
- <p>Tidy maps this to</p>
-
- <pre>
- <hr>
- <h1>heading</h1>
- <h2>sub</h2>
- <hr>
- <h2>heading</h2>
- </pre>
- </li>
-
- <li><b>Adding the missing "/" in end tags for anchors:</b>
-
- <pre>
- <a href="#refs">References<a>
- </pre>
-
- <p>Tidy maps this to</p>
-
- <pre>
- <a href="#refs">References</a>
- </pre>
- </li>
-
- <li><b>Perfecting lists by putting in tags missed out:</b>
-
- <pre>
- <body>
- <li>1st list item
- <li>2nd list item
- </pre>
-
- <p>is mapped to</p>
-
- <pre>
- <body>
- <ul>
- <li>1st list item</li>
- <li>2nd list item</li>
- </ul>
- </pre>
- </li>
-
- <li><b>Missing quotes around attribute values are added</b>
-
- <p>Tidy inserts quote marks around all attribute values for you.
- It can also detect when you have forgotten the closing quote
- mark, although this is something you will have to fix
- yourself.</p>
- </li>
-
- <li><b>Unknown/Proprietary attributes are reported</b>
-
- <p>Tidy has a comprehensive knowledge of the attributes defined
- in the HTML 4.0 recommendation from W3C. This often allows you to
- spot where you have mistyped an attribute or value.</p>
- </li>
-
- <li><b>Proprietary elements are recognized and reported as
- such.</b>
-
- <p>Tidy will even work out which version of HTML you are using
- and insert the appropriate DOCTYPE declaration, as per the W3C
- recommendations.</p>
- </li>
-
- <li><b>Tags lacking a terminating '>' are spotted</b>
-
- <p>This is something you then have to fix yourself as Tidy is
- unsure of where the > should be inserted.</p>
- </li>
- </ul>
-
- <h3>Layout style</h3>
-
- <p>You can choose which style you want Tidy to use when it
- generates the cleaned up markup: for instance whether you like
- elements to indent their contents or not. Several people have
- asked if Tidy could preserve the original layout. I am sorry to
- say that this would be very hard to support due to the way Tidy
- is implemented. Tidy starts by building a clean parse tree from
- the source file. The parse tree doesn't contain any information
- about the original layout. Tidy then pretty prints the parse tree
- using the current layout options. Trying to preserve the original
- layout would interact badly with the repair operations needed to
- build a clean parse tree and considerably complicate the
- code.</p>
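-
- <p>As a rough illustration of why the original layout is lost, here
- is a minimal Python sketch of the same two-step idea: build a tree,
- then pretty print it. It is not Tidy's code. The names
- <tt>TreeBuilder</tt>, <tt>pretty</tt> and <tt>relayout</tt> are made
- up for this example, and it assumes well-formed input with no repair
- step at all.</p>
-

```python
from html.parser import HTMLParser


class TreeBuilder(HTMLParser):
    """Builds a tiny parse tree of (tag, children) tuples.

    Whitespace-only text is dropped and other text is normalized,
    so nothing about the source layout survives in the tree.
    """

    def __init__(self):
        super().__init__()
        self.root = ("root", [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = (tag, [])
        self.stack[-1][1].append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Assumes well-formed input: every end tag closes the open element.
        if len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        text = " ".join(data.split())
        if text:
            self.stack[-1][1].append(text)


def pretty(node, depth=0, indent=2):
    """Return the output lines for a node using a fixed indent per level."""
    pad = " " * (depth * indent)
    if isinstance(node, str):
        return [pad + node]
    tag, children = node
    lines = [f"{pad}<{tag}>"]
    for child in children:
        lines.extend(pretty(child, depth + 1, indent))
    lines.append(f"{pad}</{tag}>")
    return lines


def relayout(source):
    """Parse the markup into a tree, then print it with the chosen layout."""
    builder = TreeBuilder()
    builder.feed(source)
    builder.close()
    out = []
    for child in builder.root[1]:
        out.extend(pretty(child))
    return "\n".join(out)
```

-
- <p>Two inputs that differ only in their whitespace produce the same
- tree, and therefore the same output, which is why the printer has
- nothing to go on if asked to keep the original layout.</p>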
-
- <p>Some browsers can screw up the right alignment of text
- depending on how you lay out headings. As an example,
- consider:</p>
-
- <pre>
- <h1 align="right">
- Heading
- </h1>
-
- <h1 align="right">Heading</h1>
- </pre>
-
- <p>Both of these should be rendered the same. Sadly a common
- browser bug fails to trim trailing whitespace and misaligns the
- first heading. HTML Tidy will protect you from this bug, except
- when you set the indent option to "yes".</p>
-
- <p>Setting the indent option to yes can also cause problems with
- table layout for some browsers:</p>
-
- <pre>
- <td><img src="foo.gif"></td>
- <td><img src="foo.gif"></td>
- </pre>
-
- <p>will look slightly different from:</p>
-
- <pre>
- <td>
- <img src="foo.gif">
- </td>
- <td>
- <img src="foo.gif">
- </td>
- </pre>
-
- <p>You can avoid such quirks by using indent: no or
- indent: auto in the config file.</p>
-
- <h3>Internationalization issues</h3>
-
- <p>Tidy offers you a choice of character encodings: US ASCII, ISO
- Latin-1, UTF-8 and the ISO 2022 family of 7 bit encodings. The
- full set of HTML 4.0 entities is defined. Cleaned up output uses
- HTML entity names for characters when appropriate. Otherwise
- characters outside the normal range are output as numeric
- character entities. Tidy defaults to assuming you want the output
- to be in US ASCII. Tidy doesn't yet recognize the use of the HTML
- meta element for specifying the character encoding.</p>
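-
- <p>The rule described above (named entities where one exists,
- numeric character entities otherwise) can be sketched in Python as
- follows. This is only an illustration and not Tidy's code: the
- function name <tt>to_ascii_markup</tt> and its <tt>numeric</tt>
- parameter are invented for the example, and the entity table is the
- HTML 4 set shipped with Python's standard library.</p>
-

```python
from html.entities import codepoint2name


def to_ascii_markup(text, numeric=False):
    """Rewrite characters above 127 so the result is pure US ASCII."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            # Plain ASCII passes through unchanged.
            out.append(ch)
        elif not numeric and cp in codepoint2name:
            # A named HTML 4 entity exists for this character.
            out.append(f"&{codepoint2name[cp]};")
        else:
            # Fall back to a numeric character entity.
            out.append(f"&#{cp};")
    return "".join(out)
```
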
-
- <h3>Accessibility</h3>
-
- <p>Tidy offers advice on accessibility problems for people using
- non-graphical browsers. The most common thing you will see is the
- suggestion you add a summary attribute to table elements. The
- idea is to provide a summary of the table's role and structure
- suitable for use with aural browsers.</p>
-
- <h3>Cleaning up presentational markup</h3>
-
- <p>Many tools generate HTML with an excess of FONT, NOBR and
- CENTER tags. Tidy's <em>-clean</em> option will replace them by
- style properties and rules using CSS. This makes the markup
- easier to read and maintain as well as reducing the file size!
- Tidy is expected to get smarter at this in the future.</p>
-
- <p>Some pages rely on the presentation effects of isolated
- <p> or </p> tags. Tidy deletes empty paragraph and
- heading elements etc. The use of empty paragraph elements is not
- recommended for adding vertical whitespace. Instead use style
- sheets, or the <br> element. Tidy won't discard paragraphs
- only containing a non-breaking space (&nbsp;).</p>
-
- <h3>Teaching Tidy about new tags!</h3>
-
- <p>You can teach Tidy about new tags by declaring them in the
- configuration file, the syntax is:</p>
-
- <pre>
- new-inline-tags: <em>tag1, tag2, tag3</em>
- new-empty-tags: <em>tag1, tag2, tag3</em>
- new-blocklevel-tags: <em>tag1, tag2, tag3</em>
- </pre>
-
- <p>Note that the new tags can only appear where Tidy expects
- inline or block-level tags respectively. This means you can't
- (yet) place new tags within the document head or other contexts
- with restricted content models. So far the most popular use of
- this feature is to allow Tidy to be applied to Cold Fusion
- files.</p>
-
- <p><i>I am working on ways to make it easy to customize the
- permitted document syntax using <a href=
- "http://www.w3.org/People/Raggett/dtdgen/Docs">assertion
- grammars</a>, and hope to apply this to a much smarter version of
- Tidy for release later this year.</i></p>
-
- <h3>Limited support for ASP</h3>
-
- <p>Tidy is somewhat aware of the preprocessing language called
- ASP which uses a pseudo element syntax <% ... %>
- to include preprocessor directives. ASP is normally interpreted
- by the web server before delivery to the browser. Tidy will cope
- with ASP pseudo elements within element content and as
- replacements for attributes, for example:</p>
-
- <pre>
- <option <% if rsSchool.Fields("ID").Value
- = session("sessSchoolID")
- then Response.Write("selected") %>
- value='<%=rsSchool.Fields("ID").Value%>'>
- <%=rsSchool.Fields("Name").Value%>
- (<%=rsSchool.Fields("ID").Value%>)
- </option>
- </pre>
-
- <p>Note that Tidy doesn't understand the scripting language used
- within ASP, and can easily get confused. Tidy may report missing
- attributes when these are hidden within ASP code. Tidy can also
- get things wrong if the ASP code includes quotemarks, e.g. if the
- example above is changed to:</p>
-
- <pre>
- value="<%=rsSchool.Fields("ID").Value%>"
- </pre>
-
- <p>Tidy will now see the quotemark preceding ID as ending the
- attribute value, and proceed to complain about what follows. Note
- you can choose whether to allow line wrapping on spaces within
- ASP pseudo elements or not using the <tt>wrap-asp</tt>
- option.</p>
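-
- <p>To see why the quotemarks matter, consider a scanner that knows
- nothing about ASP and simply reads an attribute value up to the next
- matching quotemark. The Python sketch below is a deliberately naive
- stand-in written for this page, not Tidy's lexer, and the function
- name <tt>naive_attribute_value</tt> is hypothetical.</p>
-

```python
def naive_attribute_value(markup, start):
    """Return the attribute value that begins at the quotemark at `start`.

    The value is taken to end at the next occurrence of the same
    quotemark, with no knowledge of ASP code inside it.
    """
    quote = markup[start]
    end = markup.index(quote, start + 1)
    return markup[start + 1:end]


double = 'value="<%=rsSchool.Fields("ID").Value%>"'
single = "value='<%=rsSchool.Fields(\"ID\").Value%>'"

# With double quotes outside, the value is cut short at the quotemark before ID.
cut_short = naive_attribute_value(double, double.index('"'))

# With single quotes outside, the whole ASP expression is kept.
complete = naive_attribute_value(single, single.index("'"))
```
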
-
- <h3>Support for XML</h3>
-
- <p>XML processors compliant with W3C's XML 1.0 recommendation are
- very picky about which files they will accept. Tidy can help you
- to fix errors that cause your XML files to be rejected. Tidy
- doesn't yet recognize all XML features though, e.g. it doesn't
- yet understand CDATA sections or DTD subsets.</p>
-
- <h3>Creating Slides</h3>
-
- <p>The <em>-slides</em> option allows you to burst a single HTML
- file into a number of linked slides. Each H2 element in the input
- file is treated as delimiting the start of the next slide. The
- slides are named slide1.html, slide2.html, slide3.html etc. This
- is a relatively new feature and ideas are welcomed as to how to
- improve it. In particular, I plan to add support to the
- configuration file for setting the style sheet for slides and for
- customizing the slides via a template.</p>
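-
- <p>The bursting step can be pictured with the short Python sketch
- below. It is an illustration only: it works on a plain string rather
- than a parse tree, the function name <tt>burst_into_slides</tt> is
- made up, and treating any markup before the first H2 as a slide of
- its own is an assumption of the sketch, not a statement about what
- Tidy does.</p>
-

```python
import re


def burst_into_slides(body_html):
    """Split markup just before each <h2> and name the pieces slideN.html."""
    # The lookahead matches the empty string in front of every <h2>,
    # so each piece after the first begins with its own heading.
    pieces = re.split(r"(?=<h2\b)", body_html, flags=re.IGNORECASE)
    pieces = [piece for piece in pieces if piece.strip()]
    return {f"slide{n}.html": piece for n, piece in enumerate(pieces, start=1)}
```
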
-
- <p>I would be interested in hearing from anyone who can offer
- help with using Javascript for adding dynamic effects to slides,
- for instance similar to those available in Microsoft
- PowerPoint.</p>
-
- <h3>Indenting text for a better layout</h3>
-
- <p>This is the style with indenting enabled:</p>
-
- <pre>
- <html>
- <head>
- </head>
- <body>
- <p>
- para which has enough text to cause a line break, and so test
- the wrapping mechanism for long lines.
- </p>
- <pre>This is
- <em>genuine
- preformatted</em>
- text
- </pre>
- <ul>
- <li>
- 1st list item
- </li>
- <li>
- 2nd list item
- </li>
- </ul>
- <!-- end comment -->
- </body>
- </html>
- </pre>
-
- <p>and this is the default style:</p>
-
- <pre>
- <html>
- <head>
- </head>
- <body>
- <p>para which has enough text to cause a line break, and so test
- the wrapping mechanism for long lines.</p>
-
- <pre>This is
- <em>genuine
- preformatted</em>
- text
- </pre>
-
- <ul>
- <li>1st list item </li>
-
- <li>2nd list item</li>
- </ul>
-
- <!-- end comment -->
- </body>
- </html>
-
- </pre>
-
- <h3><a name="help">How to run tidy</a></h3>
-
- <pre>
- <font color=
- "maroon">tidy</font> <em>[[options] filename]*</em>
- </pre>
-
- <p>HTML Tidy is not (yet) a Windows program. If you run tidy
- without any arguments, it will just sit there waiting to read
- markup on the stdin stream. Tidy's input and output default to
- stdin and stdout respectively. Errors are written to stderr but
- can be redirected to a file with the -f <em>filename</em>
- option.</p>
-
- <p>I generally use the -m option to get tidy to update the
- original file, and if the file is particularly bad I also use the
- -f option to write the errors to a file to make it easier to
- review them. Tidy supports a small set of character encoding
- options. The default is ASCII, which makes it easy to edit markup
- in regular text editors.</p>
-
- <p>For instance:</p>
-
- <pre>
- tidy -f errs.txt -m index.html
- </pre>
-
- <p>which runs tidy on the file "index.html" updating it in place
- and writing the error messages to the file "errs.txt". It's a good
- idea to save your work before tidying it, as with all complex
- software, tidy may have bugs. If you find any please let me
- know!</p>
-
- <p>Users running in Microsoft Windows should be aware that DOS
- doesn't expand wild cards in filenames. This means that if you
- have several html files in the same directory and want to tidy
- all of them:</p>
-
- <pre>
- tidy *.html
- </pre>
-
- <p>won't work. You will see an error message: "can't open file
- *.html". Instead you need to run tidy separately on each one. I
- will look into a fix for this for a future release. A work around
- is to use the DOS <em>for</em> command, as in:</p>
-
- <pre>
- for %i in (*.html) do tidy %i
- </pre>
-
- <p>Note: in a batch file that needs to be %%i instead of %i</p>
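-
- <p>If you drive Tidy from a script instead, the wildcard can be
- expanded before Tidy is called. The Python sketch below builds one
- command per matching file, mirroring the DOS <em>for</em> loop above.
- The function name <tt>tidy_commands</tt> and its parameters are
- invented for this example, and it assumes an executable named
- <tt>tidy</tt> is on your path.</p>
-

```python
import fnmatch
import os


def tidy_commands(pattern, names=None, options=()):
    """Return one tidy command line (as an argument list) per matching file.

    `names` defaults to the entries of the current directory; passing a
    list explicitly makes the function easy to try out.
    """
    if names is None:
        names = os.listdir(".")
    matches = sorted(fnmatch.filter(names, pattern))
    return [["tidy", *options, name] for name in matches]
```

-
- <p>Each returned list can be handed to <tt>subprocess.run</tt> in
- turn, which runs Tidy separately on each file.</p>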
-
- <p>Tidy writes errors to stderr, and won't be paused by the more
- command. A work around is to redirect stderr to stdout as
- follows. This works on Unix and Windows NT, but not on other
- platforms. My thanks to Markus Wolf for this tip!</p>
-
- <pre>
- tidy file.html 2>&1 | more
- </pre>
-
- <h4>Tidy's Options</h4>
-
- <p>To get a list of available options use:</p>
-
- <pre>
- tidy -help
- </pre>
-
- <p>You should see something like this:</p>
-
- <pre>
- options for tidy vers: 14th April 1999
-
- <font color=
- "maroon">-config <em>file</em></font> read config <em>file</em>
- <font color="maroon">-indent</font> <i>or</i> <font color=
- "maroon">-i</font> indent element content
- <font color="maroon">-omit</font> <i>or</i> <font color=
- "maroon">-o</font> omit optional endtags
- <font color=
- "maroon">-wrap 72</font> wrap text at column 72 (default is 68)
- <font color="maroon">-upper</font> <i>or</i> <font color=
- "maroon">-u</font> force tags to upper case (default is lower)
- <font color="maroon">-clean</font> <i>or</i> <font color=
- "maroon">-c</font> replace font, nobr & center tags by CSS
- <font color=
- "maroon">-raw</font> don't o/p entities for chars 128 to 255
- <font color=
- "maroon">-ascii</font> use ASCII for output, Latin-1 for input
- <font color=
- "maroon">-latin1</font> use Latin-1 for both input and output
- <font color=
- "maroon">-utf8</font> use UTF-8 for both input and output
- <font color=
- "maroon">-iso2022</font> use ISO2022 for both input and output
- <font color="maroon">-numeric</font> <i>or</i> <font color=
- "maroon">-n</font> output numeric rather than named entities
- <font color="maroon">-modify</font> <i>or</i> <font color=
- "maroon">-m</font> to modify original files
- <font color="maroon">-errors</font> <i>or</i> <font color=
- "maroon">-e</font> show only error messages
- <font color=
- "maroon">-f <em>file</em></font> write errors to <em>file</em>
- <font color=
- "maroon">-xml</font> use this when input is in XML
- <font color=
- "maroon">-asxml</font> to convert HTML to XML
- <font color=
- "maroon">-slides</font> to burst into slides on h2 elements
- <font color=
- "maroon">-help</font> list command line options
- </pre>
-
- <p>Input and Output default to stdin/stdout respectively. Single
- letter options apart from -f may be combined as in: tidy -f
- errs.txt -imu foo.html</p>
-
- <p><i>A future extension under consideration would allow any of
- the config file options to also be used on the command line, using
- -- as the prefix for the option name. Unfortunately, I don't have
- time to implement this for this release. My thanks to Jochen M.
- Braun for the suggestion.</i></p>
-
- <h3><a name="config">Using a Configuration File</a></h3>
-
- <p>Tidy now supports a configuration file, and this is now by far
- the most convenient way to configure Tidy. Assuming you have
- created a config file named "config.txt" (the name doesn't
- matter), you can instruct Tidy to use it via the command line
- option <tt>-config config.txt</tt>, e.g.</p>
-
- <pre>
- tidy -config config.txt file1.html file2.html
- </pre>
-
- <p>Alternatively, you can name the default config file via the
- environment variable named "HTML_TIDY". Note this should be the
- absolute path since you are likely to want to run Tidy in
- different directories. You can also set a config file at compile
- time by defining CONFIG_FILE as the path string, see
- platform.h.</p>
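-
- <p>When Tidy is launched from a script, the environment variable can
- be set for the child process only. The Python sketch below prepares
- such an environment and makes the path absolute, as advised above.
- The helper name <tt>tidy_environment</tt> is hypothetical.</p>
-

```python
import os


def tidy_environment(config_path, base_env=None):
    """Return a copy of the environment with HTML_TIDY set to an absolute path."""
    env = dict(os.environ if base_env is None else base_env)
    # An absolute path keeps working whatever directory Tidy is run from.
    env["HTML_TIDY"] = os.path.abspath(config_path)
    return env
```

-
- <p>The result can be passed as the <tt>env</tt> argument of
- <tt>subprocess.run</tt> when starting Tidy.</p>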
-
- <p>The following options are supported:</p>
-
- <dl>
- <dt>markup: <em>bool</em></dt>
-
- <dd>Determines whether Tidy generates a pretty printed version of
- the markup. Bool values are either <em>yes</em> or <em>no</em>.
- Note that Tidy won't generate a pretty printed version if it
- finds unknown tags, or missing trailing quotes on attribute
- values, or missing trailing '>' on tags. The default is <em>
- no</em>.</dd>
-
- <dt>wrap: <em>number</em></dt>
-
- <dd>Sets the right margin for line wrapping. Tidy tries to wrap
- lines so that they do not exceed this length. The default is
- 66.</dd>
-
- <dt>tab-size: <em>number</em></dt>
-
- <dd>Sets the number of columns between successive tab stops. The
- default is 4. It is used to map tabs to spaces when reading
- files. Tidy never outputs files with tabs.</dd>
-
- <dt>indent: <em>no, yes</em> or <em>auto</em></dt>
-
- <dd>If set to <em>yes</em> Tidy will indent block-level tags. The
- default is <em>no</em>. If set to <em>auto</em> Tidy will decide
- whether or not to indent the content of tags such as h1-h6, li,
- or p depending on whether or not the content includes a
- block-level element.</dd>
-
- <dt>indent-spaces: <em>number</em></dt>
-
- <dd>Sets the number of spaces to indent content when indentation
- is enabled. The default is 2 spaces.</dd>
-
- <dt>indent-attributes: <em>bool</em></dt>
-
- <dd>If set to <em>yes</em>, each attribute will begin on a new
- line. The default is <em>no</em>.</dd>
-
- <dt>hide-endtags: <em>bool</em></dt>
-
- <dd>If set to <em>yes</em>, optional end-tags will be omitted
- when generating the pretty printed markup. This option is ignored
- if you are outputting to XML. The default is <em>no</em>.</dd>
-
- <dt>input-xml: <em>bool</em></dt>
-
- <dd>If set to <em>yes</em>, Tidy will use the XML parser rather
- than the error correcting HTML parser. The default is <em>
- no</em>.</dd>
-
- <dt>output-xml: <em>bool</em></dt>
-
- <dd>If set to <em>yes</em>, Tidy will generate the pretty
- printed output writing it as well-formed XML. Any entities not
- defined in XML 1.0 will be written as numeric entities to allow
- them to be parsed by an XML parser. The tags and attributes will
- be in the case used in the input document, regardless of other
- options. The default is <em>no</em>.</dd>
-
- <dt>add-xml-pi: <em>bool</em></dt>
-
- <dd>If set to <em>yes</em>, Tidy will add the XML processing
- instruction when outputting XML or XHTML. The default is <em>
- yes</em>. Note that if the input document includes an XML PI,
- then it will appear in the output independent of the value of
- this option.</dd>
-
- <dt>output-xhtml: <em>bool</em></dt>
-
- <dd>If set to <em>yes</em>, Tidy will generate the pretty
- printed output writing it as extensible HTML. The default is <em>
- no</em>. This option causes Tidy to set the doctype and default
- namespace as appropriate to XHTML. If a doctype or namespace is
- given they will be checked for consistency with the content of the
- document. In the case of an inconsistency, the corrected values
- will appear in the output. For XHTML, entities can be written as
- named or numeric entities according to the value of the
- "numeric-entities" property. The tags and attributes will be
- output in the case used in the input document, regardless of
- other options.</dd>
-
- <dt>doctype: <em>omit, auto, strict, loose</em> or
- <<em>fpi</em>></dt>
-
- <dd>This property controls the doctype declaration generated by
- Tidy. If set to <em>omit</em> the output file won't contain a
- doctype declaration. If set to <em>auto</em> (the default) Tidy
- will use an educated guess based upon the contents of the
- document. If set to <em>strict</em>, Tidy will set the doctype to
- the strict DTD. If set to <em>loose</em>, the doctype is set to
- the loose (transitional) DTD. Alternatively, you can supply a
- string for the formal public identifier (fpi) for example:</dd>
-
- <dd>
- <pre>
- doctype: "-//ACME//DTD HTML 3.14159//EN"
- </pre>
- </dd>
-
- <dd>If you specify the fpi for an XHTML document, Tidy will set
- the system identifier to the empty string. Tidy leaves the
- document type for generic XML documents unchanged.</dd>
-
- <dt>char-encoding: <em>raw, ascii, latin1, utf8</em> or <em>
- iso2022</em></dt>
-
- <dd>Determines how Tidy interprets character streams. For <em>
- ascii</em>, Tidy will accept Latin-1 character values, but will
- use entities for all characters whose value > 127. For <em>
- raw</em>, Tidy will output values above 127 without translating
- them into entities. For <em>latin1</em> characters above 255 will
- be written as entities. For <em>utf8</em>, Tidy assumes that both
- input and output are encoded as UTF-8. You can use <em>
- iso2022</em> for files encoded using the ISO2022 family of
- encodings e.g. ISO 2022-JP. The default is <em>ascii</em>.</dd>
-
- <dt>numeric-entities: <em>bool</em></dt>
-
- <dd>Causes entities other than the basic XML 1.0 named entities
- to be written in the numeric rather than the named entity form.
- The default is <em>no</em>.</dd>
-
- <dt>quote-marks: <em>bool</em></dt>
-
- <dd>If set to yes, this causes " characters to be written out as
- &quot; as is preferred by some editing environments. The
- apostrophe character ' is written out as &#39; since many web
- browsers don't yet support &apos;. The default is <em>
- no</em>.</dd>
-
- <dt>quote-nbsp: <em>bool</em></dt>
-
- <dd>If set, this causes non-breaking space characters to be
- written out as entities. The default is <em>yes</em>.</dd>
-
- <dt>quote-ampersand: <em>bool</em></dt>
-
- <dd>If set to yes, this causes unadorned & characters to be
- written out as &amp;. The default is <em>yes</em>.</dd>
-
- <dt>fix-backslash: <em>bool</em></dt>
-
- <dd>If set to yes, this causes backslash characters "\" in URLs
- to be replaced by forward slashes "/". The default is <em>
- yes</em>.</dd>
-
- <dt>wrap-script-literals: <em>bool</em></dt>
-
- <dd>If set to yes, this allows lines to be wrapped within string
- literals that appear in script attributes. The default is <em>
- no</em>. The example shows how Tidy wraps a really really long
- script string literal inserting a backslash character before the
- linebreak:
-
- <pre>
- <a href="somewhere.html" onmouseover="document.status = '...some \
- really, really, really, really, really, really, really, really, \
- really, really long string..';">test</a>
- </pre>
- </dd>
-
- <dt>wrap-asp: <em>bool</em></dt>
-
- <dd>If set to no, this prevents lines from being wrapped within
- ASP pseudo elements. The default is <em>yes</em>.</dd>
-
- <dt>break-before-br: <em>bool</em></dt>
-
- <dd>If set, Tidy will output a line break before each <br>
- element. The default is <em>no</em>.</dd>
-
- <dt>uppercase-tags: <em>bool</em></dt>
-
- <dd>Causes tag names to be output in upper case. The default is
- <em>no</em> resulting in lowercase, except for XML input where
- the original case is preserved.</dd>
-
- <dt>uppercase-attributes: <em>bool</em></dt>
-
- <dd>Causes attribute names to be output in upper case. The
- default is <em>no</em> resulting in lowercase, except for XML
- where the original case is preserved.</dd>
-
- <dt>clean: <em>bool</em></dt>
-
- <dd>If set, causes Tidy to strip out surplus presentational tags
- and attributes replacing them by style rules and structural
- markup as appropriate. It works well on the html saved from
- Microsoft Office'97. I hope to work on cleaning up after Office
- 2000 in a future release. The default is <em>no</em>.</dd>
-
- <dt>drop-font-tags: <em>bool</em></dt>
-
- <dd>If set together with the clean option (see above), Tidy will
- discard font and center tags rather than creating the
- corresponding style rules. The default is <em>no</em>.</dd>
-
- <dt>write-back: <em>bool</em></dt>
-
- <dd>If set, Tidy will write back the tidied markup to the same
- file it read from. The default is <em>no</em>. You are advised to
- keep copies of important files before tidying them, as on rare
- occasions the result may not be what you expect.</dd>
-
- <dt>error-file: <em>filename</em></dt>
-
- <dd>Writes errors and warnings to the named file rather than to
- stderr.</dd>
-
- <dt>show-warnings: <em>bool</em></dt>
-
- <dd>If set to no, warnings are suppressed. This can be useful
- when a few errors are hidden in a flurry of warnings. The default
- is <em>yes</em>.</dd>
-
- <dt>split: <em>bool</em></dt>
-
- <dd>If set to <em>yes</em> Tidy will use the input file to create
- a sequence of slides, splitting the markup prior to each
- successive <h2>. You can see an example of the results in a
- <a href="http://www.w3.org/Talks/1999/03/23-stockholm-xhtml">
- recent talk I made on XHTML</a>. The slides are written to
- "slide1.html", "slide2.html" etc. The default is <em>
- no</em>.</dd>
-
- <dt>new-empty-tags: <em>tag1, tag2, tag3</em></dt>
-
- <dd>Use this to declare new empty inline tags. The option takes a
- space or comma separated list of tag names. Unless you declare new
- tags, Tidy will refuse to generate a tidied file if the input
- includes previously unknown tags.</dd>
-
- <dt>new-inline-tags: <em>tag1, tag2, tag3</em></dt>
-
- <dd>Use this to declare new non-empty inline tags. The option takes
- a space or comma separated list of tag names. Unless you declare
- new tags, Tidy will refuse to generate a tidied file if the input
- includes previously unknown tags.</dd>
-
- <dt>new-blocklevel-tags: <em>tag1, tag2, tag3</em></dt>
-
- <dd>Use this to declare new block-level tags. The option takes a
- space or comma separated list of tag names. Unless you declare
- new tags, Tidy will refuse to generate a tidied file if the input
- includes previously unknown tags. Note you can't as yet add new
- empty elements (similar to hr) and you can't change the content
- model for elements such as table, ul, ol and dl. This is
- explained in more detail in the <a href="release-notes.html">
- release notes</a>.</dd>
- </dl>
-
- <h4>Sample Config File</h4>
-
- <pre>
- // sample config file for HTML tidy
- indent: auto
- indent-spaces: 2
- wrap: 72
- markup: yes
- clean: yes
- output-xml: no
- input-xml: no
- show-warnings: yes
- numeric-entities: yes
- quote-marks: yes
- quote-nbsp: yes
- quote-ampersand: no
- break-before-br: no
- uppercase-tags: no
- uppercase-attributes: no
- output-xhtml: yes
- char-encoding: latin1
- </pre>
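-
- <p>If you want to inspect or generate such a file from a script, the
- format shown above is simple to read. The Python sketch below parses
- <em>name: value</em> lines and skips blank lines and lines starting
- with //. It is based only on the sample above: the function name
- <tt>parse_config</tt> is invented, and Tidy's own parser may accept
- or reject things this sketch does not.</p>
-

```python
def parse_config(text):
    """Parse 'name: value' lines into a dict of option names to strings."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and // comment lines, as in the sample file.
        if not line or line.startswith("//"):
            continue
        # Split at the first colon only, so values may contain colons.
        name, sep, value = line.partition(":")
        if not sep:
            continue
        options[name.strip()] = value.strip()
    return options
```
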
-
- <h3><a name="scripts">Using Tidy from scripts</a></h3>
-
- <p>If you want to run Tidy from Perl or another scripting
- language you may find it of value to inspect the result returned
- by Tidy when it exits: 0 if everything is fine, 1 if there were
- warnings and 2 if there were errors. This is an example using
- Perl:</p>
-
- <pre>
- # TIDY is assumed to be a pipe to a tidy process opened earlier;
- # close() returns false when that process exits with a non-zero status.
- if (close(TIDY) == 0) {
-     my $exitcode = $? >> 8;
-     if ($exitcode == 1) {
-         print STDERR "tidy issued warning messages\n";
-     } elsif ($exitcode == 2) {
-         print STDERR "tidy issued error messages\n";
-     } else {
-         die "tidy exited with code: $exitcode\n";
-     }
- } else {
-     print STDERR "tidy detected no errors\n";
- }
- </pre>
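-
- <p>The same check can be sketched in Python. The exit codes are the
- ones listed above. The function names <tt>describe_exit</tt> and
- <tt>run_tidy</tt> are made up for this example, and
- <tt>run_tidy</tt> assumes an executable named <tt>tidy</tt> is on
- your path.</p>
-

```python
import subprocess

# Exit codes as described above: 0 fine, 1 warnings, 2 errors.
MEANING = {
    0: "tidy detected no errors",
    1: "tidy issued warning messages",
    2: "tidy issued error messages",
}


def describe_exit(code):
    """Translate Tidy's exit code into a short message."""
    return MEANING.get(code, f"tidy exited with code: {code}")


def run_tidy(path):
    """Run tidy on one file and return (exit code, text written to stderr)."""
    result = subprocess.run(["tidy", path], capture_output=True, text=True)
    return result.returncode, result.stderr
```
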
-
- <h3><a name="download">Downloadable Binaries</a></h3>
-
- <p class="note">If you are prepared to maintain a public URL for
- HTML Tidy compiled for a specific platform, please let me know so
- that I can add a link to your page. This will avoid the need for
- me to update this page whenever you recompile.</p>
-
- <p><b><a href="http://www.chami.com/free/html-kit/">Windows
- users</a></b>! A free graphical user interface (HTML-Kit) for
- HTML Tidy is now available for Windows 95/98/NT. Alternatively,
- you can get tidy in its native form as a Windows (win32) console
- program: <a href="http://www.w3.org/People/Raggett/tidy.exe"><b>
- tidy.exe</b></a>, with the command options as per above. A
- version of Tidy for Windows 3.11 is in preparation.</p>
-
- <p><b><a href=
- "http://www.geocities.com/SiliconValley/1057/tidy.html">Mac
- users</a></b>! You can now run <a href=
- "http://www.geocities.com/SiliconValley/1057/tidy.html">HTML Tidy
- with FilterTop</a> (<a href=
- "http://www.geocities.com/SiliconValley/1057/images/TidyHTML.GIF">
- Screenshot</a>), or as a command line interface application. My
- thanks to <a href="mailto:teague@macbroker.com">Terry Teague</a>
- for this port.</p>
-
- <p><b><a href=
- "http://www.amiga.u-net.com/MadDogSoftware/Tidy.html">Amiga
- users</a></b>! Keith Blakemore-Noble has compiled Tidy for the
- Amiga.</p>
-
- <p><b><a href=
- "http://www-frec.bull.com/cgi-bin/list_dir.cgi/download/">AIX
- executable for Tidy</a></b>! Compiled by Ciaran Deignan. The link
- is to a general download page. The executable is available for
- AIX 4.3.2 and later.</p>
-
- <p><b><a href="http://perso.club-internet.fr/dpo/rpm/">Tidy RPM
- Package</a></b> for Red Hat Linux, maintained by <i>Dimitri
- Papadopoulos</i>. Tidy may also be available from other Linux
- distribution sites, e.g. <a href="http://rpmfind.net/">
- http://rpmfind.net/</a>.</p>
-
- <!-- no longer accessible :-(
- <p><b><a href=
- "http://www.astro.uni-bonn.de/~webstw/cm/w3c_tidy/index.html">
- Linux users</a></b>! ochen M. Braun is maintaining Tidy binary
- for Linux (ELF 32-bit LSB executable using '<tt>libc.so.5</tt>'
- for Intel 80386): '<a href=
- "ftp://ftp.astro.uni-bonn.de/pub/webstw/linsoft/tidy"><tt>tidy</tt></a>
- '. Additionally a man page can be downloaded: <a href=
- "ftp://ftp.astro.uni-bonn.de/pub/webstw/linsoft/tidy.1"><tt>
- tidy.1</tt></a>.</p>
- -->
-
- <p><b><a href="http://www.ocston.org/~simon/tidy/">Tidy for
- UnixWare</a></b>! <a href="mailto:simon@ocston.org">Simon
- Trimmer</a> is maintaining a Tidy binary for UnixWare.</p>
-
- <p><b><a href="http://members.xoom.com/nickbeee/tidy386/">
- Tidy386</a></b> for DOS, maintained by <i>Nick B</i>. This
- uses the DPMI mechanism for memory management.</p>
-
- <h3><a name="quotes">Integrating Tidy as part of other
- Software</a></h3>
-
- <p>You can also incorporate Tidy into a larger program, for
- instance an HTML editor, an HTML transformation tool used as an
- import filter, or a tool that customizes Web content to get the
- best out of different kinds of browsers. Imagine authoring clean
- HTML with CSS and, at the touch of a button, producing variants
- that look great and work reliably on a large variety of
- browsers, taking into account the quirks of each. For instance,
- you could tune content for different versions of Netscape and
- Internet Explorer, and for browsers running on set-top boxes for
- televisions, handheld and palmtop devices, cellphones, and voice
- browsers. I am happy to quote for software development for such
- tools.</p>
-
- <h3><a name="java">Java port of HTML Tidy</a></h3>
-
- <p><a href="mailto:ac.quick@sympatico.ca">Andy Quick</a> has
- ported Tidy to Java, so that you can now integrate Tidy into your
- Java applications. More information is available on <a href=
- "http://www3.sympatico.ca/ac.quick">Andy's home page</a>. Andy
- writes:</p>
-
- <blockquote><i>It uses some nice features of Java, such as
- resource bundles (for internationalization). The current state of
- the project is: I have ran a few simple tests, and it seems to
- work. In other words, it is beta software :-) I still have to
- implement the clean functionality (from clean.c) and do a lot
- more testing. I would be willing to bring the Java version to a
- stable version. After that, I may be interested in supporting it,
- but I could at least pass it on to somebody who is interested in
- supporting it.</i></blockquote>
-
- <h3><a name="implementation">Implementation details</a></h3>
-
- <p>The code is in ANSI C and uses the C standard library for I/O.
- The parser works top down, building a complete parse tree in
- memory. Document text is held as Unicode represented as UTF-8 in
- a character buffer that expands as needed. The code has so far
- been tested on Windows 95, Windows 98, Windows NT, Linux,
- FreeBSD, NetBSD, Ultrix, OSF, OS/MP, IRIX, NeXtStep, MacOS, BeOS,
- OS/2, AIX, Amiga, SunOS, Solaris and HP-UX, amongst
- others.</p>
-
- <dl>
- <dt><a href="../tidy7jul99.tgz">tidy7jul99.tgz</a></dt>
-
- <dd>gzipped tar file for source code (Unix line ends)</dd>
-
- <dt><a href="../tidy7jul99.zip">tidy7jul99.zip</a></dt>
-
- <dd>zipped source code (Windows line ends)</dd>
-
- <dt><a href="http://www.w3.org/People/Raggett/tidy.exe">
- tidy.exe</a></dt>
-
- <dd>Windows 95/NT executable (32-bit Windows console-mode
- program)</dd>
-
- <dt><a href=
- "http://www.w3.org/People/Raggett/tidy17dec98.ppc.tgz">
- tidy17dec98.ppc.tgz</a></dt>
-
- <dd>Gzipped archive of the binary for BeOS PPC R4. It also
- contains complete tidy distribution and Makefile.BeOS file for
- BeOS (from 17dec98 release of tidy).</dd>
-
- <dt><a href=
- "http://www.dd.iij4u.or.jp/~kshimz/warp/tidy/tidy.zip">Tidy on
- OS/2</a></dt>
-
- <dd>Zipped archive of the OS/2 release of tidy, as compiled by
- Kaz SHiMZ <<a href=
- "mailto:kshimz@sfc.co.jp">kshimz@sfc.co.jp</a>></dd>
-
- <dt><a href="platform.h">platform.h</a>, <a href="html.h">
- html.h</a></dt>
-
- <dd>the include files with common definitions</dd>
-
- <dt><a href="config.c">config.c</a></dt>
-
- <dd>support for customizing Tidy via config files</dd>
-
- <dt><a href="lexer.c">lexer.c</a></dt>
-
- <dd>lexical analysis and buffer management</dd>
-
- <dt><a href="parser.c">parser.c</a></dt>
-
- <dd>HTML and XML parsers</dd>
-
- <dt><a href="tags.c">tags.c</a></dt>
-
- <dd>dictionary of tags and their properties</dd>
-
- <dt><a href="attrs.c">attrs.c</a></dt>
-
- <dd>dictionary of attributes and their properties</dd>
-
- <dt><a href="istack.c">istack.c</a></dt>
-
- <dd>stack of active inline elements</dd>
-
- <dt><a href="entities.c">entities.c</a></dt>
-
- <dd>dictionary of entities</dd>
-
- <dt><a href="clean.c">clean.c</a></dt>
-
- <dd>smarts for cleaning up presentational markup</dd>
-
- <dt><a href="pprint.c">pprint.c</a></dt>
-
- <dd>pretty printing for HTML and XML</dd>
-
- <dt><a href="localize.c">localize.c</a></dt>
-
- <dd>Change this file to localize tidy's messages</dd>
-
- <dt><a href="tidy.c">tidy.c</a></dt>
-
- <dd>main() and error reporting routines</dd>
-
- <dt><a href="Makefile">Makefile</a></dt>
-
- <dd>Makefile for gcc</dd>
- </dl>
-
- <p>Conventions for whether lines end with CRLF, LF or CR vary
- from one system to another. I have included the C source for a
- utility <b>tab2space</b> which can be used to ensure that files
- use the line end convention of your choice, and to expand tabs to
- spaces.</p>
-
- <pre>
- tab2space -t4 -unix *.h *.c
- tab2space -tabs -unix Makefile
- </pre>
-
- <p>Note the use of "-tabs" to ensure that tabs are preserved in
- the Makefile (it won't work without them!).</p>
-
- <p>For those of you on Unix, here is a script you can use to
- strip carriage returns:</p>
-
- <pre>
- #!/bin/sh
- echo Stripping Carriage Returns from files...
- for i
- do
-   # Only process regular files that are writable
-   if [ -f "$i" ]
-   then
-     if [ -w "$i" ]
-     then
-       echo "$i"
-       # strip CRs into a temp file; replace the original only on success
-       tr -d '\015' < "$i" > toix.tmp && mv toix.tmp "$i"
-     else
-       echo "$i: write-protected"
-     fi
-   else
-     echo "$i: not a file"
-   fi
- done
- </pre>
-
- <p>Save this script to a file, e.g. "<em>stripcr</em>", and use
- "<em>chmod +x stripcr</em>" to make it executable. You can then
- run it as "<em>stripcr *.c *.h Overview.html Makefile</em>".</p>
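-
- <p>To check whether a file contains carriage returns, before or
- after stripping them, you could count them. This is a minimal
- sketch, assuming a POSIX <em>tr</em> and <em>wc</em>; the
- function name <em>count_cr</em> is made up for the example.</p>
-
```shell
#!/bin/sh
# count_cr FILE: print the number of carriage-return bytes in FILE.
# count_cr is a hypothetical helper name used only for illustration.
count_cr() {
    # tr -cd deletes every byte except CR (octal 015); wc -c counts
    # what is left; the final tr removes any padding wc may add.
    tr -cd '\015' < "$1" | wc -c | tr -d ' '
}

# Usage sketch: report the count for each file named on the command line
for f
do
    if [ -f "$f" ]
    then
        echo "$f: $(count_cr "$f") carriage returns"
    fi
done
```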
-
- <h2><a name="acks">Acknowledgements</a></h2>
-
- <p>I would like to thank the many people who have written to me
- with suggestions for improvements or reporting bugs. Your help
- has been invaluable.</p>
-
- <blockquote class="people">Drew Adams, Osma Ahvenlampi, Carsten
- Allefeld, Jacob Sparre Andersen, Joe D'Andrea, Jerry Andrews,
- Bruce Aron, Nick B, Chang Hyun Baek, Chuck Baslock,
- Christer Bernerus, Alexander Biron, Keith Blakemore-Noble, Eric
- Blossom, ochen M. Braun, David Brooke, Andy Brown, Keith B.
- Brown, Andreas Buchholz, Maurice Buxton, Jelks Cabaniss, Trevor
- Carden, Terry Cassidy, Mathew Cepl, Kendall Clark, Jeremy Clulow,
- Dan Connolly, Ken Cox, Keith Davies, Ciaran Deignan, Bodo Eing,
- David Fallon, Claus André Färber, Stephanie Foott,
- Darren Forcier, Frederik Fouvry, Rene Fritz, Martin Gallwey,
- David Getchell, Michael Giroux, Guus Goos, Léa Gris,
- Francisco Guardiola, Juha Häikiö, Bjoern Hoehrmann, G.
- Ken Holman, Bill Homer, Craig Horman, Jack Horsfield, Marc
- Jauvin, Rick Jelliffe, Peter Jeremy, Craig Johnson, Charles
- LaFountain, Steven Lobo, Zdenek Kabelac, Michael Kay, Axel
- Kielhorn, Johannes Koch, Rudy Kohut, Allan Kuchinsky, Volker
- Kuhlmann, Steve Lee, Tony Leneis, Nick Leverton, Dietmar Lippold,
- Gert-Jan C. Lokhorst, Anton Marsden, Bede McCall, Shane McCarron,
- Ian McKellar, Chris Nappin, Ann Navarro, Allan Odgaard, Matt
- Oshry, Gerald Oskoboiny, Paul Ossenbruggen, Ernst Paalvast,
- Christian Pantel, Dimitri Papadopoulos, Steven Pemberton, Lee
- Anne Phillips, Xavier Plantefeve, Karl Prinz, Andy Quick, Ross L.
- Richardson, Philip Riebold, Erik Rossen, Dan Rudman, Christian
- Ruetgers, Klaus Johannes Rusch, Eric Schindler, J. Schlauch,
- Christian Schüler, Klaus Alexander Seistrup, Jim Seymour,
- Kazuyoshi Shimizu, Geoff Sinclair, Jo Smith, Paul Smith, Steve
- Spilker, Rafi Stern, Michael J. Suzio, Oren Tirosh, John Tobler,
- Loïc Trégan, Simon Trimmer, Steffen Ullrich, Stuart
- Updegrave, Charles A. Upsdell, Jussi Vestman, Larry W. Virden,
- Daniel Vogelheim, Jez Wain, Paul Ward, Jeff Young, Christian
- Zuckschwerdt</blockquote>
-
- <p><small><a href="http://www.w3.org/People/Raggett">Dave
- Raggett</a> <<a href="mailto:dsr@w3.org">dsr@w3.org</a>> is
- an engineer from <a href="http://www.hp.com/">Hewlett
- Packard</a>'s <a href="http://www.hpl.hp.co.uk">UK
- Laboratories</a>, and works on assignment to the World Wide Web
- Consortium, where he is the W3C lead for HTML, Math and Voice
- Browsers.</small></p>
- </body>
- </html>
-
-